Linear Multi-Resource Allocation with Semi-Bandit Feedback

Authors

Tor Lattimore

Koby Crammer

Csaba Szepesvári

Abstract

We study an idealised sequential resource allocation problem. In each time step the learner chooses an allocation of several resource types between a number of tasks. Assigning more resources to a task increases the probability that it is completed. The problem is challenging because the alignment of the tasks to the resource types is unknown and the feedback is noisy. Our main contribution is the new setting and an algorithm with nearly-optimal regret analysis. Along the way we draw connections to the problem of minimising regret for stochastic linear bandits with heteroscedastic noise. We also present some new results for stochastic linear bandits on the hypercube that significantly improve on existing work, especially in the sparse case.

Similar resources

Lecture 9: (Semi-)bandits and experts with linear costs (part I)

In this lecture, we will study bandit problems with linear costs. In this setting, actions are represented by vectors in a low-dimensional real space. For simplicity, we will assume that all actions lie within a unit hypercube: $a \in [0, 1]^d$. The action costs $c_t(a)$ are linear in the vector $a$, namely: $c_t(a) = a \cdot v_t$ for some weight vector $v_t \in \mathbb{R}^d$ which is the same for all actions, but depends on $t$...

Optimum allocation of Iranian oil and gas resources using multi-objective linear programming and particle swarm optimization in resistive economy conditions

This research presents a model for the optimal allocation of Iranian oil and gas resources under sanction conditions, based on stochastic linear multi-objective programming. The general policies of the resistive economy include expanding exports of gas, electricity, petrochemical and petroleum products, expanding the strategic oil and gas reserves, increasing added value through completing the petroleum...

Optimal Resource Allocation with Semi-Bandit Feedback

We study a sequential resource allocation problem involving a fixed number of recurring jobs. At each time-step the manager should distribute available resources among the jobs in order to maximise the expected number of completed jobs. Allocating more resources to a given job increases the probability that it completes, but with a cut-off. Specifically, we assume a linear model where the proba...

Online Linear Optimization with Sparsity Constraints

We study the problem of online linear optimization with sparsity constraints in the semi-bandit setting. It can be seen as a marriage between two well-known problems: the online linear optimization problem and the combinatorial bandit problem. For this problem, we provide two algorithms which are efficient and achieve sublinear regret bounds. Moreover, we extend our results to two gener...
